Multi–Domain Learning: Analysis, and Methods for Multi–Attribute Domains
Abstract
A common assumption in many machine learning techniques is that the data points are independent and identically distributed (i.i.d.). However, often data can be divided into subgroups of data points that are related in some way, and this violates the assumption that they are identically distributed. Such subgroups are commonly referred to as domains or subpopulations. Domain information can be used to learn better machine learning models, and multi–domain learning techniques provide one way of using domain information in data. An important question when using multi–domain learning techniques is that of defining how a given dataset is divided into domains. Often some metadata attribute associated with the instances is used for defining domains. In this thesis, we consider the impact of the definition of domains on multi–domain learning, and propose approaches that can handle the case where domains can be defined for a given dataset in more than one way. We first present an empirical analysis of existing multi–domain learning methods, with the aim of understanding how the definition and properties of domains influence their performance. We show that the performance of multi–domain learning techniques can be affected by two factors: (i) an ensemble learning effect due to classifier combination; and (ii) the distribution of class labels across the different domains. We then show that it is possible to design a problem–driven approach to multi–domain learning. We propose a feature representation that is motivated by knowledge about the domains available in the data. Our feature representation explicitly accounts for the structural similarity among syntactic features across multiple domains, even when the domains can be defined in more than one way. Finally, we present learning methods that go beyond the current multi–domain learning paradigm, which assumes a single way of dividing the data into domains.
For many text classification tasks, multiple metadata attributes associated with the text can influence the behavior of textual features as well as the performance on the task. The different metadata attributes can have varying utility for the purpose of defining domains for multi–domain learning. Choosing a single metadata attribute to define domains in such cases may not be optimal. We propose methods that allow the use of multiple metadata attributes for defining domains, leading to better models that are still efficient.

Acknowledgments

The pursuit of a Ph.D. is a challenging [1] journey. However, it is made enjoyable by the many souls that touch upon the life of an aspiring Ph.D. in numerous ways. This is my heartfelt attempt to express gratitude to all those wonderful folks who have guided and supported me in my journey.

I am very grateful to my thesis advisors: Carolyn and William. Carolyn has been my advisor since my Master’s days at Carnegie Mellon, and has patiently guided me throughout the years, despite my at–times–wild research ideas. She gave me the freedom to pursue my research interests, while also bringing me back on track when the focus of my work seemed diluted. She also brought to my work the much–needed linguistic perspective that is valuable in thinking about any problem in natural language processing. She has taught me to think deeply about research problems, and to always meaningfully question and challenge one’s own work. William has been the wise sage on my committee, with his calm demeanor and very thoughtful advice on all matters, technical or non–technical. I am very thankful to him that he agreed to be my co–advisor starting Fall of 2010, when I needed a core machine learning perspective for my work going forward. He has taught me that seemingly small things can matter in research, and therefore nothing is too small to pay attention to when doing research.

My thesis committee members, Noah A. Smith and, in particular, Mark Dredze, have played a very significant role in shaping my dissertation work. Noah has always amazed me with his quickness in grasping the core of whatever I described to him (and my descriptions got pretty verbose at times), despite long periods of time between our meetings. His energy is contagious, and I have always come away from his office feeling enthusiastic about my work. I have learned a lot from my interactions with him, including my collaboration with him on work in text–driven forecasting. Mark is without a doubt the best “external” thesis committee member that one could hope for, and a lot more than that. I doubt if anyone else in that role would even imagine being on a phone call at 1 a.m. before a paper deadline, discussing paper edits, and the best way to present results in order to make a point. He has been almost like a third advisor to me starting from the time I did my thesis proposal. He has always asked me the most penetrating questions, and guided me patiently in finding out the answers.

Prior to coming to Carnegie Mellon, my thesis advisors at the University of Minnesota Duluth, Richard Maclin and Ted Pedersen, were instrumental in igniting my interest in machine learning and natural language processing. I am fortunate that I got to work with them at the time, and it paved my way to Carnegie Mellon.

Many amazing teachers have influenced my academic pursuit starting from my school years, and I am thankful to all of them. I would like to particularly mention Mrs. Mitali Chaudhury, Mrs. Mokashi, Mrs. Nazare, Mr. Rajput, and Ms. Rose from my school years; Mr. Gadgil, and Mr. Jadhav from my high school years; and Mr. Kajave, and Mr.

[1] Some have said it is harder than having a baby. However, since I cannot ever experience (thankfully!!) the physical ordeal involved in having a baby, I will skip that comparison.